ON BAYES' THEOREM FOR IMPROPER MIXTURES

By Peter McCullagh and Han Han 

University of Chicago 

Although Bayes's theorem demands a prior that is a probability distribution on the parameter space, the calculus associated with Bayes's theorem sometimes generates sensible procedures from improper priors, Pitman's estimator being a good example. However, improper priors may also lead to Bayes procedures that are paradoxical or otherwise unsatisfactory, prompting some authors to insist that all priors be proper. This paper begins with the observation that an improper measure on $\Theta$ satisfying Kingman's countability condition is in fact a probability distribution on the power set. We show how to extend a model in such a way that the extended parameter space is the power set. Under an additional finiteness condition, which is needed for the existence of a sampling region, the conditions for Bayes's theorem are satisfied by the extension. Lack of interference ensures that the posterior distribution in the extended space is compatible with the original parameter space. Provided that the key finiteness condition is satisfied, this probabilistic analysis of the extended model may be interpreted as a vindication of improper Bayes procedures derived from the original model.

1. Introduction. Consider a parametric model consisting of a family of probability distributions $\{P_\theta\}$ indexed by the parameter $\theta \in \Theta$. Each $P_\theta$ is a probability distribution on the observation space $\mathcal{S}_1$, usually a product space such as $\mathbb{R}^n$. In the parametric application of Bayes's theorem, the family $\{P_\theta\}$ is replaced by a single probability distribution $P_\pi(d\theta, dy) = P_\theta(dy)\,\pi(d\theta)$ on the product space $\Theta \times \mathcal{S}_1$. The associated projections are the prior $\pi$ on the parameter space and the marginal distribution

$$P_\pi(\Theta \times A) = \int_\Theta P_\theta(A)\,\pi(d\theta)$$

for $A \subset \mathcal{S}_1$. To each observation $y \in \mathcal{S}_1$ there corresponds a conditional distribution $P_\pi(d\theta \mid y)$, also called the posterior distribution, on $\Theta$.

The joint distribution $P_\pi(d\theta, dy)$ has a dual interpretation. The generative interpretation begins with $\theta$, a random element drawn from $\Theta$ with probability distribution $\pi$, the second component being distributed according to the model distribution $Y \sim P_\theta$, now treated as a conditional distribution given $\theta$. In reverse order, the inferential interpretation begins with the observational component $Y \sim P_\pi(\Theta \times dy)$ drawn from the mixture distribution, the parameter component being distributed as $\theta \sim P_\pi(\cdot \mid y)$ from the conditional distribution given $Y = y$. The conditional distribution $P_\pi(\cdot \mid y)$ tells us how to select $\theta \in \Theta$ in order that the joint distribution should coincide with the given joint distribution $P_\pi(d\theta, dy)$.

On the assumption that the marginal measure $P_\nu(dy) = \int_\Theta P_\theta(dy)\,\nu(d\theta)$ is $\sigma$-finite, formal application of the Bayes calculus with an improper prior $\nu$ yields a posterior distribution $Q(d\theta \mid y)$ satisfying

$$P_\theta(dy)\,\nu(d\theta) = P_\nu(dy)\,Q(d\theta \mid y)$$

[Eaton (1982), Eaton and Sudderth (1995)]. This factorization of the joint measure yields a conditional law that is a probability distribution, in the sense that $Q(\Theta \mid y) = 1$. However, the joint measure is not a probability distribution, so the factorization is not to be confused with Bayes's theorem: it does not offer a probabilistic interpretation of $Q(\cdot \mid y)$ as a family of conditional distributions generated by a joint probability distribution on the product space. As a result, some authors reject the Kolmogorov axiom of total probability, arguing instead for a nonunitary measure theory for Bayesian applications [Hartigan (1983), Taraldsen and Lindqvist (2010)]. The goal of this paper is to show how an improper prior may be accommodated within the standard unitary theory without deviation from the Kolmogorov axioms. A probability space is constructed from the improper measure in such a way that $Q(\cdot \mid y)$ admits a probabilistic interpretation as a family of conditional probability distributions given the observation. Section 6 shows that $\sigma$-finiteness is not needed.

It would be inappropriate here to offer a review of the vast literature on improper priors, most of which is not relevant to the approach taken here. Nonetheless, a few remarks are in order. Some statisticians clearly have qualms about the use of such priors, partly because Bayes's theorem demands that priors be proper, partly because the "degree of belief" interpretation is no longer compelling, and partly because the formal manipulation of improper priors may lead to inferential paradoxes of the sort discussed by Dawid, Stone and Zidek (1973). Lindley (1973) argues correctly that strict adherence to the rules of probability requires all priors to be proper. Even though the Bayes calculus often generates procedures yielding sensible conclusions, he concludes that improper priors must be rejected. Many statisticians, including some who interpret the prior as a "degree of belief," are inclined to take a less dogmatic view. In connection with Bernoulli trials, Bernardo and Smith (1994) (Section 5.2) comment as follows: "It is important to recognize, however, that this is merely an approximation device and in no way justifies [the improper limit $\theta^{-1}(1-\theta)^{-1}$] as having any special significance as a representation of 'prior ignorance.'" In subsequent discussion in Section 5.4, they take a more pragmatic view of a reference prior as a mathematical tool generating a reference analysis by the Bayes calculus.

The purpose of this note is to offer a purely probabilistic interpretation of an improper prior, in agreement with Lindley's thesis but not with his conclusion. The interpretation that removes the chief mathematical obstacle is that an improper measure on $\Theta$ is a probability distribution on the set of subsets of $\Theta$. A proper prior determines a random element $\theta \in \Theta$ with distribution $\pi$, whereas an improper prior $\nu$ determines a random subset, a countable collection $\{\theta_i\}$ distributed as a Poisson process with mean measure $\nu$. In the product space $\Theta \times \mathcal{S}_1$, the proper joint distribution $P_\pi$ determines a random element $(\theta, Y)$, whereas the improper distribution $P_\nu$ determines a random subset $Z \subset \Theta \times \mathcal{S}_1$, a countable collection of ordered pairs $Z = \{(\theta_i, Y_i)\}$. An observation on a point process consists of a sampling region $A \subset \mathcal{S}_1$ together with the set $y = Y \cap A$ of events that occur in $A$. It is critical that the sampling region be specified in such a way that $Y \cap A$ is finite, a condition that puts definite limits on $\nu$ and on the set of sampling schemes. Having done so, we obtain the conditional distribution given the observation. The standard Bayesian argument associates with each point $y \in \mathcal{S}_1$ a probability distribution on $\Theta$: the point process argument associates with each finite subset $y \subset A$ a probability distribution on $\Theta^y$. Despite this fundamental distinction, certain aspects of the conditional distribution are in accord with the formal application of the Bayes calculus, treating the mixture as if it were a model for a random element rather than a random subset.

2. Conditional distributions. Consider a Poisson process with mean measure $\mu$ in the product space $\mathcal{S} = \mathcal{S}_0 \times \mathcal{S}_1$. Existence of the process is guaranteed if the singletons of $\mathcal{S}$ are contained in the $\sigma$-field, and $\mu$ is a countable sum of finite measures, that is,

$$\mu = \sum_{n=1}^\infty \mu_n, \qquad \text{where } \mu_n(\mathcal{S}) < \infty. \tag{2.1}$$

Kingman's countability condition, also called weak finiteness [Kingman (1993)], is the natural condition for existence because it implies that the marginal measures $\mu_0(B) = \mu(B \times \mathcal{S}_1)$ for $B \subset \mathcal{S}_0$ and $\mu_1(A) = \mu(\mathcal{S}_0 \times A)$ for $A \subset \mathcal{S}_1$ are countable. Consequently, the projected processes exist and are also Poisson.
Unlike $\sigma$-finiteness, countability does not imply the existence of a subset $A \subset \mathcal{S}$ such that $0 < \mu(A) < \infty$. If such a set exists, the process is said to be observable on $A$. For example, the measure taking the value $\infty$ on subsets of positive Lebesgue measure in $\mathbb{R}$ and zero otherwise is countable, but the process is not observable on any subset. Sigma-finiteness is a stronger condition, sufficient for existence but not necessary, and not inherited by the projected marginal measures [Kingman (1993)].

The symbol $Z \sim \mathrm{PP}(\mu)$ denotes a Poisson point process, which is a random subset $Z \subset \mathcal{S}$ such that for each finite collection of disjoint subsets $A_1, \ldots, A_n$ of $\mathcal{S}$, the random variables $\#(Z \cap A_1), \ldots, \#(Z \cap A_n)$ are distributed independently according to the Poisson distribution $\#(Z \cap A_j) \sim \mathrm{Po}(\mu(A_j))$. In much of what follows, it is assumed that $\mu(\mathcal{S}) = \infty$, which implies that $\#Z \sim \mathrm{Po}(\infty)$ is infinite with probability one, but countable on account of (2.1). Since $Z$ is countable and $\mathcal{S}$ is a product set, we may label the events

$$Z = (X, Y) = \{(X_i, Y_i) : i = 1, 2, \ldots\},$$

where $X \subset \mathcal{S}_0$ is a Poisson process with mean measure $\mu_0$ and $Y \subset \mathcal{S}_1$ is a Poisson process with mean measure $\mu_1$. The notation $Z = (X, Y)$ implies that $X \subset \mathcal{S}_0$ and $Y \subset \mathcal{S}_1$ are countable subsets whose elements are in a specific 1-1 correspondence.

To say what is meant by an observation on a point process, we must first establish the sampling protocol, which is a test set or sampling region $A \subset \mathcal{S}_1$ such that $\mu_1(A) < \infty$. In this scheme, $\mathcal{S}_0$ is the domain of inference, so $X$ is not observed. The actual observation is the test set $A$ together with the random subset $y = Y \cap A$, which is finite with probability one. Although we refer to $\mathcal{S}_1$ as the "space of observations," it must be emphasized that an observation is not a random element in $\mathcal{S}_1$, but a finite random subset $y \subset A \subset \mathcal{S}_1$, which could be empty.

The distinction between a point process and an observation on the process is the same as the distinction between an infinite process and an observation on that process. An infinite process is a sequence of random variables $Y = (Y_1, Y_2, \ldots)$ indexed by the natural numbers, that is, a random function $Y: \mathbb{N} \to \mathbb{R}$. An observation consists of a sample, a finite subset $A \subset \mathbb{N}$, together with the response values $Y[A]$ for the sampled units. Likewise, a point process is a random subset considered as a random function $Y: \mathcal{S}_1 \to \{0, 1\}$ indexed by the domain $\mathcal{S}_1$. An observation consists of a sample or sampling region $A \subset \mathcal{S}_1$ together with the restriction $Y[A] = Y \cap A$ of the process to the sample. Usually $A$ is not finite or even countable, but the observation is necessarily finite in the sense that $\#(Y \cap A) < \infty$.

Whether we are talking of sequences or point processes, the domain of inference is not necessarily to be interpreted as a parameter space: in certain applications discussed below, the observation space consists of finite sequences in $\mathcal{S}_1 = \mathbb{R}^n$, and $\mathcal{S}_0 = \mathbb{R}^\infty$ is the set of subsequent trajectories. In this sense, predictive sample-space inferences are an integral part of the general theory (Section 4.2).

We focus here on inferences for the $X$-values associated with the events $y = Y \cap A$ that occur in the sampling region, that is, the subset

$$x = X[A] = \{X_i : Y_i \in A\} = \{X_i : Y_i \in y\}$$

in 1-1 correspondence with the observation $y$. In this formal sense, an inference is a rule associating with each finite subset $y \subset A$ a probability distribution on $\mathcal{S}_0^y$.

Clearly, if $y$ is empty, $x$ is also empty, so the conditional distribution is trivial, putting probability one on the event that $x$ is empty. Without loss of generality, therefore, we assume that $0 < \mu_1(A) < \infty$, that $m = \#y$ is positive and finite, and that the events are labeled $(Y_1, \ldots, Y_m)$ by a uniform random permutation independent of $Z$. Given $\#y = m$, the pairs $(X_1, Y_1), \ldots, (X_m, Y_m)$ are independent and identically distributed random variables with probability density $\mu(dx\,dy)/\mu_1(A)$ in $\mathcal{S}_0 \times A$. Thus, the conditional joint density given $Y \cap A = y$ is equal to

$$p(dx \mid y) = \prod_{i=1}^m \frac{\mu(dx_i \times dy_i)}{\mu_1(dy_i)} = \prod_{i=1}^m \mu(dx_i \mid y_i), \tag{2.2}$$

where $\mu(dx \mid y)$ is the limiting ratio $\mu(dx \times dy)/\mu_1(dy)$ as $dy \downarrow \{y\}$.

The key properties of this conditional distribution are twofold, conditional independence and lack of interference. First, the random variables $X_1, \ldots, X_m$ are conditionally independent given $Y \cap A = y$. Second, the conditional distribution of $X_i$ given $y$ depends only on $Y_i$, not on the number or position of other events in $A$. For example, if two or more events occur at the same point ($Y_i = Y_j$), the random variables $X_i, X_j$ are conditionally independent and identically distributed given $y$. The test set determines the events on which predictions are made, but beyond that it has no effect. In particular, if $m = 1$, the conditional density of $X$ is $p(dx \mid y) \propto \mu(dx \mid y)$ regardless of the test set.
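The observation scheme behind (2.2) can be illustrated by simulation. The following sketch assumes, purely for illustration, a hypothetical mean measure $\mu(dx \times dy) = N(x; y, 1)\,dx\,dy$ on $\mathbb{R} \times \mathbb{R}$, so that $\mu_1$ is Lebesgue measure (infinite), $\mu(dx \mid y)$ is the $N(y, 1)$ distribution, and the test set $A = [0, L]$ has $\mu_1(A) = L < \infty$. The function names are hypothetical; the Poisson sampler is Knuth's multiplication method because the standard library has none.

```python
import math
import random


def poisson_variate(mean, rng):
    """Draw from Po(mean) by Knuth's multiplication method (adequate for moderate means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def observe_on_test_set(length, rng):
    """Simulate y = Y ∩ A and the paired values X[A] for the test set A = [0, length].

    Illustrative assumption: mu(dx x dy) = N(x; y, 1) dx dy, so mu_1(A) = length.
    """
    m = poisson_variate(length, rng)                    # #(Y ∩ A) ~ Po(mu_1(A))
    ys = [rng.uniform(0.0, length) for _ in range(m)]   # iid from mu_1(dy) / mu_1(A) on A
    # (2.2): given y, the X_i are independent, each drawn from mu(dx | y_i) = N(y_i, 1);
    # X_i depends only on its own y_i, not on m or on the other events in A.
    xs = [rng.gauss(y, 1.0) for y in ys]
    return ys, xs


rng = random.Random(2024)
ys, xs = observe_on_test_set(5.0, rng)
print(len(ys), "events observed in A")
```

Drawing the count first and then independent pairs from the normalized restriction of $\mu$ to $\mathcal{S}_0 \times A$ is exactly the finite-$\mu_1(A)$ situation assumed above; with $\mu_1(A) = \infty$ the first step is impossible.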

The observability assumption $\mu_1(A) < \infty$ is not made out of concern for what might reasonably be expected of an observer in the field. On the contrary, finiteness is essential to the mathematical argument leading to (2.2). If the number of events were infinite, countability implies that the values can be labeled sequentially $y_1, y_2, \ldots$ in 1-1 correspondence with the integers. Countability does not imply that they can be labeled in such a way that the infinite sequence is exchangeable. As a result, the factorization (2.2) fails if $\#y = \infty$.

The remark made above, that the test set has no effect on inferences, is correct but possibly misleading. Suppose that $0 < m < \infty$ and that the observation consists of that information alone without recording the particular values. If $\mu_1(A) = 0$ or $\mu_1(A) = \infty$, no inference is possible beyond the fact that the model is totally incompatible with the observation. If the marginal measure is finite on $A$, the conditional density is such that the components of $X[A]$ are independent and identically distributed with density $\mu(dx \times A)/\mu_1(A)$, which does depend on the choice of test set. In the context of parametric mixture models with $\Theta = \mathcal{S}_0$, each sequence with distribution $P_\theta$ has probability $P_\theta(A)$ of being recorded. Thus, before observation begins, the restriction to $A \subset \mathcal{S}_1$ effectively changes the measure to $P_\theta(A)\,\nu(d\theta)$, which is finite on $\Theta$, but depends on the choice of $A$.

3. Improper mixtures. Consider a parametric statistical model consisting of a family of probability distributions $\{P_\theta : \theta \in \Theta\}$ on the observation space $\mathcal{S}_1$, one distribution for each point $\theta$ in the parameter space. Each model distribution determines a random element $Y \sim P_\theta$. A probability distribution $\pi$ on $\Theta$ completes the Bayesian specification, and each Bayesian model also determines a random element $(\theta, Y) \in \Theta \times \mathcal{S}_1$ distributed as $\pi(d\theta)P_\theta(dy)$. The observational component is a random element $Y \in \mathcal{S}_1$ distributed as the mixture $Y \sim P_\pi$, and the conditional distribution given $Y = y$ is formally the limit of $\pi(d\theta)P_\theta(dy)/P_\pi(\Theta \times dy)$ as $dy \downarrow \{y\}$.

A countable measure $\nu$ such that $\nu(\Theta) = \infty$ does not determine a random element $\theta \in \Theta$, but it does determine an infinite random subset $X \subset \Theta$. Furthermore, the joint measure $\nu(d\theta)P_\theta(dy)$ is countable, so there exists a random subset $Z = (X, Y) \subset \Theta \times \mathcal{S}_1$, distributed according to the Poisson process with mean measure $\nu(d\theta)P_\theta(dy)$. If this interpretation is granted, it is necessary first to specify the sampling region $A \subset \mathcal{S}_1$ in such a way that $P_\nu(A) < \infty$, to ensure that only finitely many events $y = Y \cap A$ occur in $A$. To each observed event $Y_i \in y$, there corresponds a parameter point $\theta_i \in X[A]$ such that $(\theta_i, Y_i) \in Z$. Parametric inference consists in finding the joint conditional distribution given $Y \cap A = y$ of the particular subset of parameter values $\theta_1, \ldots, \theta_m$ corresponding to the events observed.

This probabilistic interpretation forces us to think of the parameter and the observation in a collective manner, as sets rather than points. Taken literally, the improper mixture is not a model for a random element in $\Theta \times \mathcal{S}_1$, but a model for a random subset $Z = (X, Y) \subset \Theta \times \mathcal{S}_1$. If $\nu(\Theta) < \infty$, as in a proper mixture, it is sufficient to take $A = \mathcal{S}_1$ and to record the entire subset $y \subset \mathcal{S}_1$, which is necessarily finite. However, if $\nu(\Theta) = \infty$, it is necessary to sample the process by first establishing a test set $A \subset \mathcal{S}_1$ such that $P_\nu(A) < \infty$, and then listing the finite set of values $y = Y \cap A$ that occur in $A$. Generally speaking, this finiteness condition rules out many sampling schemes that might otherwise seem reasonable. In the special case where $\#y = 1$, $X[A]$ is a random subset consisting of a single point, whose conditional density at $x \in \Theta$ is

$$\mathrm{pr}(X[A] \in dx \mid y = \{y\}) = \frac{\nu(dx)\,p_x(y)}{\int_\Theta p_\theta(y)\,\nu(d\theta)}, \tag{3.1}$$

where $p_\theta(y)$ is the density of $P_\theta$ at $y$. The finiteness condition on $A$ ensures that the integral in the denominator is finite, and the occurrence of an event at $y$ implies that $P_\nu$ assigns positive mass to each open neighborhood of $y$.
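A numerical version of (3.1) simply normalizes $\nu(dx)\,p_x(y)$ over $\Theta$. The sketch below assumes, for illustration only, the model $P_\theta = N(\theta, 1)$ on $\mathcal{S}_1 = \mathbb{R}$ with the improper prior $\nu(d\theta) = d\theta$; in that case (3.1) is the $N(y, 1)$ density and the denominator $\int p_\theta(y)\,d\theta$ equals 1. The grid limits and helper names are hypothetical choices.

```python
import math


def normal_density(y, theta):
    """Density p_theta(y) of N(theta, 1) at y."""
    return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2.0 * math.pi)


def posterior_on_grid(y, prior_density, lo, hi, steps=4000):
    """Evaluate the single-event conditional density (3.1) on a uniform grid over [lo, hi].

    Returns the grid, the normalized density values, and the trapezoidal
    approximation to the denominator, i.e. the density of P_nu at y.
    """
    h = (hi - lo) / steps
    grid = [lo + i * h for i in range(steps + 1)]
    weights = [prior_density(t) * normal_density(y, t) for t in grid]
    denom = h * (sum(weights) - 0.5 * (weights[0] + weights[-1]))
    return grid, [w / denom for w in weights], denom


y_obs = 1.3
grid, post, denom = posterior_on_grid(y_obs, lambda t: 1.0, y_obs - 10.0, y_obs + 10.0)
step = grid[1] - grid[0]
post_mean = step * sum(t * q for t, q in zip(grid, post))
print(round(denom, 6), round(post_mean, 6))
```

Replacing the flat prior by another locally integrable density changes only `prior_density`; the sketch does not check that the denominator is finite, which is the role of the condition on $A$.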

Provided that $0 < P_\nu(A) < \infty$, this purely probabilistic conclusion may be interpreted as a vindication of the formal Bayes calculation associated with an improper prior. However, the two versions of Bayes's theorem are quite different in logical structure: one implies a single random element, the other infinitely many. Accordingly, if a statistical procedure is to be judged by a criterion such as a conventional loss function, which presupposes a single observation and a single parameter, we should not expect optimal results from a probabilistic theory that demands multiple observations and multiple parameters. Conversely, if the procedure is to be judged by a criterion that allows for multiple sequences each with its own parameter, we should not expect useful results from a probabilistic theory that recognizes only one sequence and one parameter. Thus, the existence of a joint probability model associated with an improper prior does not imply optimality in the form of coherence, consistency or admissibility. For example, in the MANOVA example of Eaton and Sudderth (1995), the Poisson point process interpretation yields the classical posterior, which is incoherent in de Finetti's sense and is strongly inconsistent in Stone's sense.

The observability condition implies that the restriction of $P_\nu$ to $A$ is finite, and hence trivially $\sigma$-finite. The role of the finiteness condition is illustrated by two examples in Sections 4 and 6. For the Gaussian model, $P_\nu$ is countable for every $n$ and $\sigma$-finite for $n \ge 2$, which guarantees the existence of a sampling region if $n \ge 2$. For the Bernoulli model, $P_\nu$ is countable for each $n$ but not $\sigma$-finite for any $n$. Nonetheless, the finiteness condition for observability is satisfied by certain subsets $A \subset \{0, 1\}^n$ for $n \ge 2$.



4. Gaussian point process. 



4.1. Parametric version. Consider the standard model for a Gaussian sequence with independent $N(\theta, \sigma^2)$ components. Let $p$ be a given real number, and let the prior measure be $\nu(d\theta\,d\sigma) = d\theta\,d\sigma/\sigma^p$ on the parameter space $\Theta = \mathbb{R} \times \mathbb{R}^+$. For all $p$, both $\nu$ and the joint measure on $\Theta \times \mathbb{R}^n$ satisfy the countability condition. Consequently a Poisson point process $Z = (X, Y) \subset \Theta \times \mathbb{R}^n$ exists in the product space. For $n > 2 - p$, the marginal measure $P_\nu$ has a density in $\mathbb{R}^n$

$$\lambda_n(y) = \frac{\Gamma((n + p - 2)/2)\, 2^{(p-3)/2}\, \pi^{-(n-1)/2}\, n^{-1/2}}{\left(\sum_{i=1}^n (y_i - \bar y)^2\right)^{(n+p-2)/2}}, \tag{4.1}$$

which is finite at all points $y \in \mathbb{R}^n$ except for the diagonal set. Provided that $n \ge 2$ and $n > 2 - p$, there exists in $\mathbb{R}^n$ a subset $A$ such that $P_\nu(A) < \infty$, which serves as the region of observation. In fact, these conditions are sufficient for $\sigma$-finiteness in this example. To each observation $y = Y \cap A$ and to each event $y \in y$, there corresponds a conditional distribution on $\Theta$ with density

$$p(\theta, \sigma \mid Y \cap A = y,\ y \in y) = \phi_n(y; \theta, \sigma)\,\sigma^{-p}/\lambda_n(y),$$

where $\phi_n(y; \theta, \sigma)$ is the Gaussian density at $y$ in $\mathbb{R}^n$. The conditional distribution (2.2) of the parameter subset $X[A] \subset \Theta$ given $Y \cap A = y$ is a product of factors of this type, one for each of the events in $y$. It should be emphasized here that the information in the conditioning event is not simply that $y \subset Y$, but also that $Y$ contains no other events in $A$.

4.2. Nonparametric version. Let $\mathbb{N}$ be the set of natural numbers, and let $\mathcal{S} = \mathbb{R}^{\mathbb{N}}$ be the collection of real-valued sequences,

$$\mathcal{S} = \mathbb{R}^{\mathbb{N}} = \{y = (y_1, y_2, \ldots) : y_i \in \mathbb{R},\ i \in \mathbb{N}\},$$

with product $\sigma$-field $\mathcal{B}^{\mathbb{N}}$. We construct directly in this space a Poisson process $Z \subset \mathcal{S}$ whose mean measure $\lambda$ is uniquely determined by its finite-dimensional projections $\lambda_n$ with density (4.1). By their construction, these measures are finitely exchangeable and satisfy the Kolmogorov consistency condition $\lambda_{n+1}(A \times \mathbb{R}) = \lambda_n(A)$ for each integer $n$ and $A \in \mathcal{B}^n$. In keeping with the terminology for random sequences, we say that the point process $Z \sim \mathrm{PP}(\lambda)$ is infinitely exchangeable if each $\lambda_n$ is finitely exchangeable.

Let $\mathcal{S} = \mathcal{S}_1 \times \mathcal{S}_0$, where $\mathcal{S}_1 = \mathbb{R}^n$ is the projection onto the first $n$ coordinates, and $\mathcal{S}_0 = \mathcal{S}$ is the complementary projection onto the subsequent coordinates. Each event $z \in Z$ is an ordered pair, so we write $Z = (Y, X) \subset \mathcal{S}$ as a countable set of ordered pairs $(Y_i, X_i)$ in which the marginal process $Y \subset \mathcal{S}_1$ is Poisson with mean measure $\lambda_n$, and $X \sim \mathrm{PP}(\lambda)$ has the same distribution as $Z$. Provided that the set $A \subset \mathcal{S}_1$ has finite $\lambda_n$-measure, the observation $y = Y \cap A$ is finite. To each event $y \in y$, there corresponds an event $z = (y, x) \in Z$, so that $y = (z_1, \ldots, z_n)$ is the initial sequence, and $x = (z_{n+1}, \ldots)$ is the subsequent trajectory. The conditional distribution (2.2) is such that the subsequent trajectories $X[A]$ are conditionally independent and noninterfering given $Y \cap A = y$. For each event $y \in y$, the $k$-dimensional joint density at $x = (x_1, \ldots, x_k)$ of the subsequent trajectory is

$$p(dx \mid Y \cap A = y,\ y \in y) = \frac{\lambda_{n+k}(y, x)\,dx}{\lambda_n(y)}, \tag{4.2}$$

which is the $k$-dimensional exchangeable Student $t$ density [Kotz and Nadarajah (2004), page 1] on $n + p - 2 > 0$ degrees of freedom.
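The Gaussian-model marginal density $\lambda_n$ and the predictive ratio $\lambda_{n+k}(y, x)/\lambda_n(y)$ can be evaluated directly. The sketch below works on the log scale and checks numerically, for one hypothetical initial segment with $n = 4$ and $p = 2$, that the one-step ratio integrates to 1 over $x$, as Kolmogorov consistency $\lambda_{n+1}(A \times \mathbb{R}) = \lambda_n(A)$ requires. Function names, test values and the integration range are illustrative assumptions.

```python
import math


def log_lambda(y, p):
    """Log of the Gaussian-model marginal density under the prior d(theta) d(sigma) / sigma^p.

    lambda_n(y) = Gamma((n+p-2)/2) 2^((p-3)/2) pi^(-(n-1)/2) n^(-1/2) / SS^((n+p-2)/2),
    where SS is the sum of squared deviations from the mean.
    """
    n = len(y)
    a = (n + p - 2) / 2.0
    if a <= 0:
        raise ValueError("the density requires n > 2 - p")
    ybar = sum(y) / n
    ss = sum((v - ybar) ** 2 for v in y)
    return (math.lgamma(a) + 0.5 * (p - 3) * math.log(2.0)
            - 0.5 * (n - 1) * math.log(math.pi) - 0.5 * math.log(n)
            - a * math.log(ss))


def predictive_density(x, y, p):
    """Joint density of the next k = len(x) values given the initial segment y."""
    return math.exp(log_lambda(list(y) + list(x), p) - log_lambda(list(y), p))


y0, p0 = [0.3, 1.1, -0.4, 0.9], 2
lo, hi, steps = -100.0, 100.0, 100_000
h = (hi - lo) / steps
mass = h * sum(predictive_density([lo + (i + 0.5) * h], y0, p0) for i in range(steps))
print(round(mass, 6))
```

Here the one-step predictive is a Student $t$ on $n + p - 2 = 4$ degrees of freedom, so the tails beyond $\pm 100$ carry negligible mass and the midpoint sum should be very close to 1.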
For any continuous location-scale model with finite $p$th moment and improper prior density proportional to $d\mu\,d\sigma/\sigma^p$ with $p > 0$, the initial segment $Y \subset \mathbb{R}^2$ is a Poisson process with intensity

$$\lambda_2(y) \propto \frac{1}{|y_1 - y_2|^p}.$$

Otherwise, if $p \le 0$, the initial segment of length $n > 2 - p$ is a Poisson process with intensity

$$\lambda_n(y) \propto \frac{1}{\left(\sum_{i=1}^n (y_i - \bar y)^2\right)^{(n+p-2)/2}}.$$

The prescription (4.2) extends each event $y \in Y$ to an infinite random sequence in such a way that the set of extended sequences $Z \subset \mathbb{R}^{\mathbb{N}}$ is a Poisson process with mean measure $\lambda$. Given $Y \subset \mathbb{R}^2$, these extensions are conditionally independent, noninterfering, and each extension is an exchangeable sequence. In the Gaussian case, (4.2) is equivalent to the statement that each initial sequence with $s_n > 0$ is extended according to the recursive Gosset rule

$$y_{n+1} = \bar y_n + s_n\,\varepsilon_n \sqrt{\frac{n^2 - 1}{n(n + p - 2)}},$$

where $\bar y_n$ and $s_n^2$ are the sample mean and variance (with divisor $n - 1$) of the first $n$ components, and the $\varepsilon_n \sim t_{n+p-2}$ are independent. The resulting extension is an exchangeable sequence whose $k$-dimensional joint density at $(y_3, \ldots, y_{k+2})$ is $\lambda_{k+2}(y_1, \ldots, y_{k+2})/\lambda_2(y_1, y_2)$.
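The recursive Gosset rule can be sketched as a sampler. The sketch assumes that $s_n^2$ is the sample variance with divisor $n - 1$ (the choice under which the scale factor $\sqrt{(n^2-1)/(n(n+p-2))}$ reproduces the Student $t_{n+p-2}$ predictive of the Gaussian model), draws $t$ variates as $Z/\sqrt{\chi^2_\nu/\nu}$ from the standard library's Gaussian and gamma generators, and uses hypothetical function names. It is an illustration, not a reference implementation.

```python
import math
import random


def student_t(df, rng):
    """Student t variate with df degrees of freedom, drawn as Z / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = 2.0 * rng.gammavariate(df / 2.0, 1.0)  # chi-square(df) = 2 * Gamma(df/2, scale 1)
    return z / math.sqrt(chi2 / df)


def gosset_extend(y, steps, p, rng):
    """Extend the initial segment y by `steps` values using the recursive Gosset rule.

    Assumes s_n^2 has divisor n - 1; the running mean and the sum of squared
    deviations are updated with Welford's method, so each step costs O(1).
    """
    seq = list(y)
    n = len(seq)
    mean = sum(seq) / n
    ssd = sum((v - mean) ** 2 for v in seq)       # sum of squared deviations
    for _ in range(steps):
        df = n + p - 2
        if ssd <= 0.0 or df <= 0:
            raise ValueError("the rule needs s_n > 0 and n + p - 2 > 0")
        s_n = math.sqrt(ssd / (n - 1))
        new = mean + s_n * student_t(df, rng) * math.sqrt((n * n - 1) / (n * df))
        seq.append(new)
        n += 1                                    # Welford update of mean and ssd
        delta = new - mean
        mean += delta / n
        ssd += delta * (new - mean)
    return seq


rng = random.Random(11)
path = gosset_extend([0.2, 1.5], 2000, 1, rng)
print(sum(path) / len(path))
```

With $p = 1$ and $n = 2$ the first increment is Cauchy, so early values can be large; as $n$ grows the running mean and variance settle down, in line with the statement that $(\bar y_n, s_n^2)$ has a limit.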

The Gosset extension is such that the sequence $(\bar y_n, s_n^2)$ is Markovian and has a limit. Given a single sequence in the sampling region, the joint distribution of the limiting random variables $(\bar y_\infty, s_\infty)$ is

$$p(\bar y_\infty, s_\infty \mid Y \cap A = y,\ y \in y) = \phi_n(y; \bar y_\infty, s_\infty)\, s_\infty^{-p}/\lambda_n(y),$$

which is the posterior density on $\Theta$ as computed by the Bayes calculus with improper prior.



5. Cauchy sequences. Consider the standard model for a Cauchy sequence having independent components with parameter $\theta \in \mathbb{R} \times \mathbb{R}^+$. For $p > 0$, the prior measure $\nu(d\theta) = d\theta_1\,d\theta_2/\theta_2^p$ satisfies the countability condition, which implies that a Poisson process $Z = (X, Y) \subset \Theta \times \mathbb{R}^n$ with mean measure $P_\nu$ exists in the product space. If $0 < p < n$ and $n \ge 2$, the marginal measure in $\mathbb{R}^n$ has a density which is finite at all points $y \in \mathbb{R}^n$ whose components are distinct. The density satisfies the recurrence formula

$$\lim_{y_n \to \pm\infty} \pi y_n^2\, \lambda_{n,p}(y_1, \ldots, y_n) = \lambda_{n-1,p-1}(y_1, \ldots, y_{n-1}).$$
For integer $p \ge 1$, the density is

$$\lambda_{n,p}(y) = \begin{cases} (-1)^{(n-p+1)/2} \displaystyle\sum_{r<s} \frac{|y_r - y_s|^{n-p}}{2^{n-p}\,\pi^{n-2}\, d_r d_s}, & (n-p) \text{ odd}; \\[2ex] -(-1)^{(n-p)/2} \displaystyle\sum_{r<s} \frac{(y_r - y_s)^{n-p} \log|y_r - y_s|}{2^{n-p-1}\,\pi^{n-1}\, d_r d_s}, & (n-p) \text{ even}; \end{cases} \tag{5.1}$$

where $d_r = \prod_{t \ne r} (y_t - y_r)$. For example, $\lambda_{2,1}(y) = 1/(2|y_1 - y_2|)$ and

$$\lambda_{3,2}(y) = \frac{1}{2\pi\,|(y_1 - y_2)(y_2 - y_3)(y_1 - y_3)|}.$$

Spiegelhalter (1985), equation (2.2), established the same formula for $p = 1$.
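The closed form for the Cauchy marginal density is a finite sum over pairs $r < s$ with $d_r = \prod_{t \ne r}(y_t - y_r)$, using $|y_r - y_s|^{n-p}$ when $n - p$ is odd and $(y_r - y_s)^{n-p}\log|y_r - y_s|$ when $n - p$ is even. The sketch below implements those two cases for integer $0 < p < n$ and distinct components. The tests reproduce $\lambda_{2,1}(0, 1) = 1/2$, $\lambda_{3,2}(0, 1, 2) = 1/(4\pi)$ and the recurrence formula, together with $\lambda_{3,1}(0, 1, 2) = \log 2/(2\pi^2)$, a value that can be checked by integrating the Cauchy likelihood against $d\theta_1\,d\theta_2/\theta_2$. The function name is hypothetical.

```python
import math
from itertools import combinations


def cauchy_marginal(y, p):
    """Marginal density lambda_{n,p}(y) of the Cauchy model under d(theta1) d(theta2) / theta2^p.

    Sum over pairs r < s with d_r = prod_{t != r} (y_t - y_r); requires integer
    0 < p < n and distinct components of y.
    """
    n = len(y)
    if int(p) != p or not 0 < p < n:
        raise ValueError("requires integer 0 < p < n")
    p = int(p)
    d = [math.prod(y[t] - y[r] for t in range(n) if t != r) for r in range(n)]
    k = n - p
    total = 0.0
    for r, s in combinations(range(n), 2):
        gap = abs(y[r] - y[s])
        term = gap ** k / (d[r] * d[s])            # gap**k = (y_r - y_s)**k when k is even
        total += term if k % 2 == 1 else term * math.log(gap)
    if k % 2 == 1:
        return (-1) ** ((k + 1) // 2) * total / (2 ** k * math.pi ** (n - 2))
    return -((-1) ** (k // 2)) * total / (2 ** (k - 1) * math.pi ** (n - 1))


print(cauchy_marginal([0.0, 1.0], 1))
print(cauchy_marginal([0.0, 1.0, 2.0], 2))
```

The pairwise form makes the scaling property visible: each term has degree $-(n + p - 2)$ in $y$, matching $\lambda_{n,p}(cy) = c^{-(n+p-2)}\lambda_{n,p}(y)$.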

For $n > p$, there exists a subset $A \subset \mathbb{R}^n$ such that $\lambda_n(A) < \infty$, which serves as the region of observation. The Poisson process determines a probability distribution on finite subsets $y \subset A$, and to each point $y \in y$ it also associates a conditional distribution on $\Theta$ with density

$$\frac{P_\nu(d\theta \times dy)}{\lambda_n(dy)} = \frac{f_n(y; \theta)\,\theta_2^{-p}}{\lambda_n(y)}, \tag{5.2}$$

where $f_n(y; \theta)$ is the Cauchy density at $y \in \mathbb{R}^n$.

In the nonparametric version, with the parameter space replaced by the space of subsequent trajectories, the conditional distribution extends each point $y \in A$ to a sequence $(y, X) \in \mathbb{R}^{n+k}$, with conditional density $X \sim \lambda_{n+k}(y, x)/\lambda_n(y)$. The extension is infinitely exchangeable. The tail trajectory of the infinite sequence is such that, if $T_k: \mathbb{R}^k \to \Theta$ is Cauchy-consistent, $T_{n+k}(y, X)$ has a limit whose density at $\theta \in \Theta$ is (5.2).

6. Binary sequences. Consider the standard model for a Bernoulli sequence with parameter space $\Theta = (0, 1)$. The prior measure $\nu(d\theta) = d\theta/(\theta(1-\theta))$ determines a Poisson process with intensity $\theta^{n_1(y)-1}(1-\theta)^{n_0(y)-1}$ at $(y, \theta)$ in the product space $\mathcal{S}_1 \times \Theta$. Here $\mathcal{S}_1 = \{0, 1\}^n$ is the space of sequences of length $n$, $n_0(y)$ is the number of zeros and $n_1(y)$ is the number of ones in $y$. The marginal measure on the observation space is

$$\lambda_n(\{y\}) = \begin{cases} \Gamma(n_0(y))\,\Gamma(n_1(y))/\Gamma(n), & n_0(y), n_1(y) > 0, \\ \infty, & \text{otherwise}, \end{cases}$$

which is countable but not $\sigma$-finite. Any subset $A \subset \mathcal{S}_1$ that excludes the zero sequence and the unit sequence has finite measure and can serve as the region of observation. Given such a set and the observation $y = Y \cap A$ recorded with multiplicities, the conditional distribution (2.2) associates with each $y \in y$ the beta distribution

$$p(\theta \mid Y \cap A = y,\ y \in y) = \frac{\theta^{n_1(y)-1}(1-\theta)^{n_0(y)-1}\,\Gamma(n)}{\Gamma(n_0(y))\,\Gamma(n_1(y))}$$

on the parameter space.
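The marginal measure and the beta conditional distribution for binary sequences can be checked by enumeration. The sketch below, with hypothetical helper names, computes $\lambda_n(\{y\})$ on the log-gamma scale, sums it over all non-constant sequences of length $n$ (a region of observation of the kind just described), and verifies numerically that the conditional density of $\theta$ integrates to 1. Since $\binom{n}{k}\Gamma(k)\Gamma(n-k)/\Gamma(n) = 1/k + 1/(n-k)$, the total mass of that region is $2(1 + 1/2 + \cdots + 1/(n-1))$, which is $11/3$ for $n = 4$.

```python
import math
from itertools import product


def zero_one_counts(seq):
    """Return (n0, n1), the numbers of zeros and ones in a binary sequence."""
    n1 = sum(seq)
    return len(seq) - n1, n1


def marginal_mass(seq):
    """lambda_n({y}) = Gamma(n0) Gamma(n1) / Gamma(n); infinite for constant sequences."""
    n0, n1 = zero_one_counts(seq)
    if n0 == 0 or n1 == 0:
        return math.inf
    return math.exp(math.lgamma(n0) + math.lgamma(n1) - math.lgamma(n0 + n1))


def conditional_density(theta, seq):
    """Beta density theta^(n1 - 1) (1 - theta)^(n0 - 1) / lambda_n({y}) on (0, 1)."""
    n0, n1 = zero_one_counts(seq)
    return theta ** (n1 - 1) * (1.0 - theta) ** (n0 - 1) / marginal_mass(seq)


n = 4
region = [s for s in product((0, 1), repeat=n) if 0 < sum(s) < n]   # excludes 0000 and 1111
mass_A = sum(marginal_mass(s) for s in region)
print(mass_A)
```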
As in the preceding section, we may bypass the parameter space and proceed directly by constructing a Poisson process with mean measure $\lambda$ in the space of infinite binary sequences. The values assigned by $\lambda$ to the infinite zero sequence and the infinite unit sequence are not determined by $\{\lambda_n\}$, and can be set to any arbitrary value, finite or infinite. Regardless of this choice, (2.2) may be used to predict the subsequent trajectory of each of the points $y = Y \cap A$ provided that $\lambda_n(A) < \infty$. In particular, the conditional distribution of the next subsequent component is

$$\mathrm{pr}(y_{n+1} = 1 \mid Y \cap A = y,\ y \in y) = n_1(y)/n.$$

This is the standard Pólya urn model [Durrett (2010)] for which the infinite average of all subsequent components is a beta random variable with density proportional to $\theta^{n_1(y)-1}(1-\theta)^{n_0(y)-1}$, in agreement with the parametric analysis.
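The Pólya urn description translates directly into a sampler. In this sketch (hypothetical names, standard-library random numbers), each step appends a 1 with probability equal to the current proportion of ones, which is the rule $n_1(y)/n$ applied recursively. Because the proportion of ones is a martingale, the average over many replicates of the long-run proportion should be close to $n_1(y)/n$, the mean of the beta limit; for $y = (0, 1, 1)$ that is $2/3$.

```python
import random


def polya_extend(seq, steps, rng):
    """Extend a binary sequence by the Polya urn rule pr(next = 1) = n1 / n."""
    out = list(seq)
    ones = sum(out)
    for _ in range(steps):
        nxt = 1 if rng.random() < ones / len(out) else 0
        out.append(nxt)
        ones += nxt
    return out


rng = random.Random(5)
finals = [sum(polya_extend([0, 1, 1], 400, rng)) / 403 for _ in range(2000)]
print(sum(finals) / len(finals))
```

A constant initial sequence is never left by this rule, which mirrors the fact that the zero and unit sequences carry infinite measure and must be excluded from the sampling region.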

7. Interpretation. The point-process interpretation of an improper measure on $\Theta$ forces us to think of the parameter in a collective sense as a random
subset rather than a random point. One interpretation is that a proper prior 
is designed for a specific scientific problem whose goal is the estimation of 
a particular parameter about which something may be known, or informed 
guesses can be made. An improper mixture is designed for a generic class of 
problems, not necessarily related to one another scientifically, but all having 
the same mathematical structure. Logistic regression models, which are used 
for many purposes in a wide range of disciplines, are generic in this sense. 
In the absence of a specific scientific context, nothing can be known about 
the parameter, other than the fact that there are many scientific problems 
of the same mathematical type, each associated with a different parameter 
value. In that wider sense of a generic mathematical class, it is not unnatu- 
ral to consider a broader framework encompassing infinitely many scientific 
problems, each with its own parameter. The set of parameters is random 
but not indexed in an exchangeable way. 

A generic model may be tailored to a specific scientific application by coupling it with a proper prior distribution $\pi$ that is deemed relevant to the scientific context. If there is broad agreement about the model and the relevance of $\pi$ to the context, subsequent calculations are uncontroversial. Difficulties arise when no consensus can be reached about the prior. According to one viewpoint, each individual has a personal prior or belief; Bayes's theorem is then a recipe for the modification of personal beliefs [Bernardo and Smith (1994), Chapter 2]. Another line of argument calls for a panel of so-called experts to reach a consensus before Bayes's theorem can be used in a mutually agreeable fashion [Weerahandi and Zidek (1981), Genest, McConway and Schervish (1986)]. A third option is to use proper but flat or relatively uninformative priors. Each of these options demands a proper prior on $\Theta$ in order that Bayes's theorem may be used.

This paper offers a fourth option by showing that it is possible to apply Bayes's theorem to the generic model. Rather than forcing the panel to reach a proper consensus, we may settle for an improper prior as a countable sum of proper, and perhaps mutually contradictory, priors generated by an infinite number of experts. Although Bayes's theorem can be used, the structure of the theorem for an improper mixture is not the same as the structure for a proper prior. For example, improper Bayes estimators need not be admissible.

Finiteness of the restriction of the measure to the sampling region is needed in our argument. If the restriction to the sampling region is $\sigma$-finite, we may partition the region into a countable family of disjoint subsets of finite measure, and apply the extension subset by subset. The existence of a Poisson point process on the sampling region is assured by Kingman's superposition theorem. Lack of interference implies that these extensions are mutually consistent, so there is no problem dealing with such $\sigma$-finite restrictions. This is probably not necessary from a statistical perspective, but it does not create any mathematical problems because the extension does not depend on the choice of the partition of the region.

8. Marginalization paradoxes. The unBayesian characteristic of an impro- 
per prior distribution is highlighted by the marginalization paradoxes discus- 
sed by Stone and Dawid (1972) and by Dawid, Stone and Zidek (1973). In the 
following example from Stone and Dawid (1972), the formal marginal poste- 
rior distribution calculated by two methods demonstrates the inconsistency. 

Example 8.1. The observation consists of two independent exponential random variables $X \sim \mathcal{E}(\theta\phi)$ and $Y \sim \mathcal{E}(\phi)$, where $\theta$ and $\phi$ are unknown parameters. The parameter of interest is the ratio $\theta$.

Method 1. The joint density is

$$\mathrm{pr}(dx, dy \mid \theta, \phi) = \theta\phi^2 e^{-\phi(\theta x + y)}\,dx\,dy.$$

Given the improper prior distribution $\pi(\theta)\,d\theta\,d\phi$, the marginal posterior distribution for $\theta$ is

$$\pi(\theta \mid x, y) \propto \frac{\pi(\theta)\,\theta}{(\theta + z)^3}, \tag{8.1}$$

where $z = y/x$.

Method 2. Notice that the posterior distribution depends on $(x, y)$ only through $z$. For a given $\theta$, $z/\theta$ has an $F_{2,2}$ distribution, that is,

$$p(z \mid \theta) = \frac{\theta}{(\theta + z)^2}.$$

Using the implied marginal prior $\pi(\theta)\,d\theta$, as if it were the limit of a sequence of proper priors, we obtain

$$\pi(\theta \mid z) \propto \frac{\pi(\theta)\,\theta}{(\theta + z)^2}, \tag{8.2}$$

which differs from (8.1). It has been pointed out by Dempster and in the authors' rejoinder [Dawid, Stone and Zidek (1973)] that no choice of $\pi(\theta)$ could bring the two analyses into agreement.
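For a concrete comparison of the two methods, take the hypothetical choice $\pi(\theta) = 1/\theta$. Method 1 then gives the normalized density $2z^2/(\theta + z)^3$ on $\theta > 0$, with distribution function $1 - z^2/(t + z)^2$, and Method 2 gives $z/(\theta + z)^2$, with distribution function $t/(t + z)$. The sketch below evaluates both, confirms the normalizations numerically, and shows that the medians differ: $z(\sqrt{2} - 1)$ under Method 1 against $z$ under Method 2. The prior and all names are illustrative assumptions.

```python
import math


def cdf_method1(t, z):
    """CDF of the Method 1 posterior when pi(theta) = 1/theta: density 2 z^2 / (theta + z)^3."""
    return 1.0 - z * z / (t + z) ** 2


def cdf_method2(t, z):
    """CDF of the Method 2 posterior when pi(theta) = 1/theta: density z / (theta + z)^2."""
    return t / (t + z)


def numeric_mass(density, z, steps=200_000):
    """Integrate density(theta, z) over (0, inf) via theta = u / (1 - u) and the midpoint rule."""
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        theta = u / (1.0 - u)
        total += density(theta, z) / (1.0 - u) ** 2   # d(theta) = du / (1 - u)^2
    return total / steps


z = 2.0
print(cdf_method1(z, z), cdf_method2(z, z))
print(numeric_mass(lambda th, zz: 2 * zz * zz / (th + zz) ** 3, z),
      numeric_mass(lambda th, zz: zz / (th + zz) ** 2, z))
```

At $t = z$ the two distribution functions equal $3/4$ and $1/2$, so the same observation supports visibly different conclusions about $\theta$.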

From the present viewpoint, the improper prior determines a random subset of the parameter space and a random subset of the observation space $(\mathbb{R}^+)^2$. Under suitable conditions on $\pi$, the bivariate intensity

$$\lambda(x, y) = \int_0^\infty \frac{2\theta\,\pi(\theta)}{(\theta x + y)^3}\,d\theta$$

is finite on the interior of the observation space, so the bivariate process is observable. Equation (2.2) associates with each event $(x, y)$ the conditional distribution (8.1), in agreement with the formal calculation by Method 1. Each event $(x, y)$ determines a ratio $z = y/x$, and the set of ratios is a Poisson point process in $(0, \infty)$. However, the marginal measure is such that $\lambda_Z(A) = \infty$ for sets of positive Lebesgue measure, and zero otherwise. This measure is countable, but the marginal process is not observable. Thus, conclusion (8.2) deduced by Method 2 does not follow from (2.2), and there is no contradiction.
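Integrating the joint density $\theta\phi^2 e^{-\phi(\theta x + y)}$ against $d\phi$ gives $2\theta/(\theta x + y)^3$, so the bivariate intensity is $\lambda(x, y) = \int_0^\infty 2\theta\,\pi(\theta)\,(\theta x + y)^{-3}\,d\theta$. For the hypothetical flat choice $\pi(\theta) = 1$ this equals $1/(x^2 y)$, which is finite on the interior of $(\mathbb{R}^+)^2$. The sketch checks that value numerically at one illustrative point; the function name and test point are assumptions.

```python
def intensity(x, y, prior, steps=100_000):
    """lambda(x, y) = integral over (0, inf) of 2 theta prior(theta) / (theta x + y)^3.

    Uses the substitution theta = u / (1 - u) on (0, 1) and the midpoint rule.
    """
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        theta = u / (1.0 - u)
        total += 2.0 * theta * prior(theta) / (theta * x + y) ** 3 / (1.0 - u) ** 2
    return total / steps


x0, y0 = 1.5, 0.7
print(intensity(x0, y0, lambda t: 1.0), 1.0 / (x0 ** 2 * y0))
```

With this flat choice, the measure of $\{a < y/x < b\}$ works out to $\log(b/a)\int_0^\infty x^{-2}\,dx = \infty$, which is the non-observability of the ratio described above.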

Conversely, if the prior measure $\pi(d\theta)\rho(d\phi)$ is multiplicative with $\rho(\mathbb{R}^+) < \infty$ and $\pi$ locally finite, the marginal measure on the observation space is such that

$$P_\nu(\{a < y/x < b\}) < \infty$$

for $0 < a < b < \infty$. Thus, the ratio $z = y/x$ is observable, and the conditions for Method 2 are satisfied. The point process model associates with each ratio $0 < z < \infty$ the conditional distribution with density

$$p(\theta \mid z) \propto \frac{\pi(\theta)\,\theta}{(\theta + z)^2},$$

in agreement with (8.2). However, the conditional distribution given $(x, y)$,

$$\pi(\theta, \phi \mid (x, y)) \propto \theta\phi^2 e^{-\phi(\theta x + y)}\,\pi(\theta)\,\rho(\phi),$$

is such that the marginal distribution of $\theta$ given $(x, y)$ is not a function of $z$ alone. Once again, there is no conflict with (8.2).

All of the other marginalization paradoxes in Dawid, Stone and Zidek (1973) follow the same pattern.

Jaynes (2003) asserts that "an improper pdf has meaning only as the limit of a well-defined sequence of proper pdfs." On this point, there seems to be near-universal agreement, even among authors who take diametrically opposed views on other aspects of the marginalization paradox [Akaike (1980), Dawid, Stone and Zidek (1973) and Wallstrom (2007)]. No condition of this sort occurs in the point-process theory. However, a sequence of measures $\mu_n$ such that $\mu_n(A) < \infty$, each of which assigns a conditional distribution (2.2) to every $y \in A$, may have a weak limit $\mu_n \to \mu$ such that $\mu(A) = \infty$, for which no conditional distribution exists.